Large language models can perform new tasks in a zero-shot fashion, given natural language prompts that specify the desired behavior. Such prompts are typically hand engineered, but can also be learned with gradient-based methods from labeled data. However, it is underexplored what factors make the prompts effective, especially when the prompts are natural language. In this paper, we investigate common attributes shared by effective prompts. We first propose a human readable prompt tuning method (F LUENT P ROMPT) based on Langevin dynamics that incorporates a fluency constraint to find a diverse distribution of effective and fluent prompts. Our analysis reveals that effective prompts are topically related to the task domain and calibrate the prior probability of label words. Based on these findings, we also propose a method for generating prompts using only unlabeled data, outperforming strong baselines by an average of 7.0% accuracy across three tasks.
translated by 谷歌翻译
In this work, we explore a useful but often neglected methodology for robustness analysis of text generation evaluation metrics: stress tests with synthetic data. Basically, we design and synthesize a wide range of potential errors and check whether they result in a commensurate drop in the metric scores. We examine a range of recently proposed evaluation metrics based on pretrained language models, for the tasks of open-ended generation, translation, and summarization. Our experiments reveal interesting insensitivities, biases, or even loopholes in existing metrics. For example, we find that BERTScore ignores truncation errors in summarization, and MAUVE (built on top of GPT-2) is insensitive to errors at the beginning of generations. Further, we investigate the reasons behind these blind spots and suggest practical workarounds for a more reliable evaluation of text generation.
translated by 谷歌翻译
Large pretrained language models generate fluent text but are notoriously hard to controllably sample from. In this work, we study constrained sampling from such language models: generating text that satisfies user-defined constraints, while maintaining fluency and the model's performance in a downstream task. We propose MuCoLa -- a sampling procedure that combines the log-likelihood of the language model with arbitrary (differentiable) constraints in a single energy function, and then generates samples in a non-autoregressive manner. Specifically, it initializes the entire output sequence with noise and follows a Markov chain defined by Langevin Dynamics using the gradients of the energy function. We evaluate MuCoLa on text generation with soft and hard constraints as well as their combinations obtaining significant improvements over competitive baselines for toxicity avoidance, sentiment control, and keyword-guided generation.
translated by 谷歌翻译
随着人工智能系统变得越来越强大和普遍,人们对机器的道德或缺乏道德的关注变得越来越关注。然而,向机器讲授道德是一项艰巨的任务,因为道德仍然是人类中最激烈的争论问题之一,更不用说AI了。但是,部署到数百万用户的现有AI系统已经在做出充满道德影响的决策,这构成了一个看似不可能的挑战:教学机器的道德意义,而人类继续努力努力。为了探索这一挑战,我们介绍了Delphi,这是一个基于深层神经网络的实验框架,直接训练了描述性道德判断,例如,“帮助朋友”通常是不错的,而“帮助朋友传播假新闻”不是。经验结果提供了对机器伦理的承诺和局限性的新见解。面对新的道德情况,德尔菲(Delphi)表现出强大的概括能力,而现成的神经网络模型表现出明显差的判断,包括不公正的偏见,证实了对明确教学机器的道德意义的必要性。然而,德尔菲并不完美,表现出对普遍性偏见和不一致的敏感性。尽管如此,我们还是展示了不完美的Delphi的积极用例,包括在其他不完美的AI系统中将其用作组件模型。重要的是,我们根据著名的道德理论来解释Delphi的运营化,这使我们提出了重要的未来研究问题。
translated by 谷歌翻译
Simulating rigid collisions among arbitrary shapes is notoriously difficult due to complex geometry and the strong non-linearity of the interactions. While graph neural network (GNN)-based models are effective at learning to simulate complex physical dynamics, such as fluids, cloth and articulated bodies, they have been less effective and efficient on rigid-body physics, except with very simple shapes. Existing methods that model collisions through the meshes' nodes are often inaccurate because they struggle when collisions occur on faces far from nodes. Alternative approaches that represent the geometry densely with many particles are prohibitively expensive for complex shapes. Here we introduce the Face Interaction Graph Network (FIGNet) which extends beyond GNN-based methods, and computes interactions between mesh faces, rather than nodes. Compared to learned node- and particle-based methods, FIGNet is around 4x more accurate in simulating complex shape interactions, while also 8x more computationally efficient on sparse, rigid meshes. Moreover, FIGNet can learn frictional dynamics directly from real-world data, and can be more accurate than analytical solvers given modest amounts of training data. FIGNet represents a key step forward in one of the few remaining physical domains which have seen little competition from learned simulators, and offers allied fields such as robotics, graphics and mechanical design a new tool for simulation and model-based planning.
translated by 谷歌翻译
In computer-aided drug discovery (CADD), virtual screening (VS) is used for identifying the drug candidates that are most likely to bind to a molecular target in a large library of compounds. Most VS methods to date have focused on using canonical compound representations (e.g., SMILES strings, Morgan fingerprints) or generating alternative fingerprints of the compounds by training progressively more complex variational autoencoders (VAEs) and graph neural networks (GNNs). Although VAEs and GNNs led to significant improvements in VS performance, these methods suffer from reduced performance when scaling to large virtual compound datasets. The performance of these methods has shown only incremental improvements in the past few years. To address this problem, we developed a novel method using multiparameter persistence (MP) homology that produces topological fingerprints of the compounds as multidimensional vectors. Our primary contribution is framing the VS process as a new topology-based graph ranking problem by partitioning a compound into chemical substructures informed by the periodic properties of its atoms and extracting their persistent homology features at multiple resolution levels. We show that the margin loss fine-tuning of pretrained Triplet networks attains highly competitive results in differentiating between compounds in the embedding space and ranking their likelihood of becoming effective drug candidates. We further establish theoretical guarantees for the stability properties of our proposed MP signatures, and demonstrate that our models, enhanced by the MP signatures, outperform state-of-the-art methods on benchmark datasets by a wide and highly statistically significant margin (e.g., 93% gain for Cleves-Jain and 54% gain for DUD-E Diverse dataset).
translated by 谷歌翻译
神经算法推理的基石是解决算法任务的能力,尤其是以一种概括分布的方式。尽管近年来,该领域的方法学改进激增,但它们主要集中在建立专家模型上。专业模型能够学习仅执行一种算法或具有相同控制流骨干的算法的集合。相反,在这里,我们专注于构建通才神经算法学习者 - 单个图形神经网络处理器,能够学习执行各种算法,例如分类,搜索,动态编程,路径触发和几何学。我们利用CLRS基准来凭经验表明,就像在感知领域的最新成功一样,通才算法学习者可以通过“合并”知识来构建。也就是说,只要我们能够在单任务制度中学习很好地执行它们,就可以以多任务的方式有效地学习算法。在此激励的基础上,我们为CLR提供了一系列改进,对CLR的输入表示,培训制度和处理器体系结构,将平均单任务性能提高了20%以上。然后,我们进行了多任务学习者的彻底消融,以利用这些改进。我们的结果表明,一位通才学习者有效地结合了专家模型所捕获的知识。
translated by 谷歌翻译
生长免费的在线3D形状集合决定了3D检索的研究。然而,已经进行了积极的辩论(i)最佳输入方式是触发检索,以及(ii)这种检索的最终用法场景。在本文中,我们为回答这些问题提供了不同的观点 - 我们研究了3D草图作为输入方式,并提倡进行检索的VR-Scenario。因此,最终的愿景是用户可以通过在VR环境中自由空气供电来自由地检索3D模型。作为新的3D VR-Sketch的首次刺入3D形状检索问题,我们做出了四个贡献。首先,我们对VR实用程序进行编码以收集3D VR-Sketches并进行检索。其次,我们从ModelNet收集了两个形状类别的第一套$ 167 $ 3D VR-SKETCHES。第三,我们提出了一种新的方法,以生成不同抽象级别类似人类的3D草图的合成数据集,以训练深层网络。最后,我们比较了常见的多视图和体积方法:我们表明,与3D形状到3D形状检索相比,基于体积点的方法在3D草图上表现出卓越的性能,并且由于稀疏和抽象的性质而显示出3D形状的检索3D VR-Sketches。我们认为,这些贡献将集体成为未来在此问题的尝试的推动者。 VR接口,代码和数据集可在https://tinyurl.com/3dsketch3dv上找到。
translated by 谷歌翻译
我们介绍了1,497个3D VR草图和具有较大形状多样性的椅子类别的3D形状对的第一个细粒数据集。我们的数据集支持草图社区的最新趋势,以细粒度的数据分析,并将其扩展到主动开发的3D域。我们争辩说最方便的草图场景,其中草图由稀疏的线条组成,并且不需要任何草图技能,事先培训或耗时的准确绘图。然后,我们首次将细粒度3D VR草图的场景研究为3D形状检索,作为一种新颖的VR素描应用程序和一个探索基础,以推动通用见解以告知未来的研究。通过实验在这个新问题上精心选择的设计因素组合,我们得出重要的结论以帮助跟进工作。我们希望我们的数据集能够启用其他新颖的应用程序,尤其是那些需要细粒角的应用程序,例如细粒度的3D形状重建。该数据集可在tinyurl.com/vrsketch3dv21上获得。
translated by 谷歌翻译
我们研究基于3D-VR-Sketch的细粒度3D形状检索的实际任务。此任务特别令人感兴趣,因为2D草图被证明是2D图像的有效查询。但是,由于域间隙,很难从2D草图中以3D形状的检索获得强劲的性能。最近的工作证明了3D VR素描在此任务上的优势。在我们的工作中,我们专注于3D VR草图中固有的不准确性造成的挑战。我们观察到,带有固定边缘值的三胞胎损失获得的检索结果,通常用于检索任务,包含许多无关的形状,通常只有一个或几个或几个具有与查询相似的结构。为了减轻此问题,我们首次在自适应边距值和形状相似性之间建立联系。特别是,我们建议使用由“拟合差距”驱动的自适应边距值的三重损失,这是在结构保护变形下的两个形状的相似性。我们还进行了一项用户研究,该研究确认这种拟合差距确实是评估形状结构相似性的合适标准。此外,我们介绍了202个VR草图的数据集,用于从内存而不是观察到的202个3D形状。代码和数据可在https://github.com/rowl1ng/structure-aware-aware-vr-sketch-shape-retrieval中找到。
translated by 谷歌翻译